8 research outputs found

    Estimating regional unemployment with mobile network data for Functional Urban Areas in Germany

    The ongoing growth of cities due to better job opportunities is leading to increased labour-related commuter flows in several countries. On the one hand, an increasing number of people commute and move to the cities, but on the other hand, the labour market indicates higher unemployment rates in urban areas than in the surrounding areas. We investigate this phenomenon on regional level by an alternative definition of unemployment rates in which commuting behaviour is integrated. We combine data from the Labour Force Survey (LFS) with dynamic mobile network data by small area models for the federal state North Rhine-Westphalia in Germany. From a methodical perspective, we use a transformed Fay-Herriot model with bias correction for the estimation of unemployment rates and propose a parametric bootstrap for the Mean Squared Error (MSE) estimation that includes the bias correction. The performance of the proposed methodology is evaluated in a case study based on official data and in model-based simulations. The results in the application show that unemployment rates (adjusted by commuters) in German cities are lower than traditional official unemployment rates indicate.
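
As a rough illustration of the area-level idea behind the Fay-Herriot model named in this abstract, the sketch below shrinks each direct estimate toward a regression prediction, with more shrinkage where the sampling variance is larger. It covers only the basic, untransformed model; the transformation, bias correction, and bootstrap MSE from the abstract are not shown. The function name, the single covariate, and the assumption that the random-effect variance `sigma_u2` is already known are illustrative choices, not details from the paper.

```python
def fay_herriot_eblup(direct, sampling_var, x, sigma_u2):
    """Basic Fay-Herriot-style shrinkage with one covariate (illustrative sketch).

    direct[d]       direct survey estimate for area d
    sampling_var[d] sampling variance of that estimate, treated as known
    x[d]            one auxiliary covariate per area (must not be constant)
    sigma_u2        variance of the area random effect, ASSUMED given here;
                    in practice it has to be estimated from the data
    """
    # Weighted least squares with weights 1 / (sigma_u2 + sampling variance).
    w = [1.0 / (sigma_u2 + psi) for psi in sampling_var]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, direct)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, direct))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    estimates = []
    for y, psi, xi in zip(direct, sampling_var, x):
        # Shrinkage factor: close to 1 when the direct estimate is precise.
        gamma = sigma_u2 / (sigma_u2 + psi)
        synthetic = intercept + slope * xi
        estimates.append(gamma * y + (1.0 - gamma) * synthetic)
    return estimates


if __name__ == "__main__":
    # Made-up numbers for three areas, purely for demonstration.
    print(fay_herriot_eblup([0.10, 0.07, 0.12], [0.0004, 0.0025, 0.0009],
                            [0.30, 0.20, 0.35], sigma_u2=0.0005))
```

Areas with a large sampling variance get a small shrinkage factor and are pulled strongly toward the regression line, which is the mechanism that stabilizes estimates for small samples.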

    Estimation of Disaggregated Indicators with Application to the Household Finance and Consumption Survey

    International institutions and national statistical institutes are increasingly expected to report disaggregated indicators, i.e., means, ratios or Gini coefficients for different regional levels, socio-demographic groups or other subpopulations. These subpopulations are called areas or domains in this thesis. The data sources that are used to estimate these disaggregated indicators are mostly national surveys which may have small sample sizes for the domains of interest. Therefore, direct estimates that are based only on the survey data might be unreliable. To overcome this problem, small area estimation (SAE) methods help to increase the precision of survey-based estimates without demanding larger and more costly surveys. In SAE, the collected survey data is combined with other data sources, e.g., administrative and register data or data that is a by-product of digital activities. The data requirements for various SAE methods depend to a large extent on whether the indicator of interest is a linear or non-linear function of a quantitative variable. For the estimation of linear indicators, e.g., the mean, aggregated data is sufficient, that is, direct estimates and auxiliary information from other data sources only need to be available for each domain. One popular area-level approach in this context is the Fay-Herriot model that is studied in Part 1 of this work. In Chapter 1, the Fay-Herriot model is used to estimate the regional distribution of the mean household net wealth in Germany. The analysis is based on the Household Finance and Consumption Survey (HFCS) that was launched by the European Central Bank and several statistical institutes in 2010. The main challenge of applying the Fay-Herriot approach in this context is to handle the issues arising from the data: a) the skewness of the wealth distribution, b) informative weights due to, among others, unit non-response, and c) multiple imputation to deal with item non-response.
For the latter, a modified Fay-Herriot model that accounts for the additional uncertainty due to multiple imputation is proposed in this thesis. It is combined with known solutions for the other two issues and applied to estimate mean net wealth at low regional levels. The Deutsche Bundesbank, which is responsible for reporting the wealth distribution in Germany, as well as many economic institutes, predominantly work with the statistical software Stata. To make the Fay-Herriot model and its extensions used in Chapter 1 available to practitioners, a new Stata command called fayherriot is programmed in the context of this thesis. Chapter 2 describes the functionality of the command with an application to income data from the Socio-Economic Panel, one of the largest panel surveys in Germany. The example application demonstrates how the Fay-Herriot approach helps to increase the reliability of estimates for mean household income compared to direct estimates at three different regional levels. In an extension to estimating linear indicators, Part 2 deals with the estimation of non-linear income and wealth indicators. Since the mean is sensitive to outliers, the median and other quantiles are also of interest when estimating the income or wealth distribution. As a first approach, this thesis focuses on the direct estimation of quantiles, which is not as straightforward as for the mean. In Chapter 3, common quantile definitions implemented in standard statistical software are empirically evaluated based on income and wealth distributions with regard to their bias. The analysis shows that, especially for wealth data that is mostly heavily skewed, sample sizes need to be large in order to obtain unbiased direct estimates with the common quantile definitions.
Since a design-unbiased direct estimator is one assumption of the aforementioned Fay-Herriot model, further research would be necessary in order to use the Fay-Herriot approach for the estimation of quantiles when the underlying data is heavily skewed. More common methods for producing reliable estimates for non-linear indicators -- including quantiles, poverty indicators, and inequality indicators such as the Gini coefficient -- in small domains are unit-level SAE methods. However, for these methods, the data requirements are more restrictive. Both the survey data and the auxiliary data need to be available for each unit in each domain. Among others, the empirical best prediction (EBP), the World-Bank method, and the M-Quantile approach are well-known methods for the estimation of non-linear indicators in small domains. However, these methods are either not available in statistical software or the user-friendliness is limited. Therefore, the R package emdi, which focuses on a user-friendly application of the EBP, is developed in this work. Chapter 4 describes how the package emdi supports the user beyond the estimation by tools for assessing and presenting the results. Both area- and unit-level SAE models are based on linear mixed regression models that rely on a set of assumptions, particularly the linearity and normality of the error terms. If these assumptions are not fulfilled, transforming the response variable is one possible solution. Therefore, Part 3 provides a guideline for the usage of transformations. Chapter 5 gives an extensive overview of different transformations applicable in linear and linear mixed regression models and discusses practical challenges. Implementations of various transformations and of estimation methods for transformation parameters are provided by the R package trafo, which is described in Chapter 6.
Altogether, this work contributes to the literature by a) combining SAE and multiple imputation in a modified Fay-Herriot approach, b) showing limitations of existing quantile definitions with regard to bias when data is skewed and the sample size is small, c) closing the gap between academic research and practical applications by providing user-friendly software for the estimation of linear and non-linear indicators, and d) giving a framework for the usage of transformations in linear and linear mixed regression models.
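
The point made for Chapter 3 above, that common quantile definitions can disagree on small, skewed samples, can be seen in a minimal sketch. It uses the two definitions offered by Python's standard `statistics.quantiles` function; the five sample values are made up for illustration and are not taken from the thesis, and the thesis itself evaluates definitions from other statistical software.

```python
import statistics

# Small, heavily right-skewed illustrative sample (made-up values, not survey data).
wealth = [1, 2, 3, 4, 100]


def quartiles_by_definition(values):
    """Return the three quartile cut points under each available definition."""
    return {
        method: statistics.quantiles(values, n=4, method=method)
        for method in ("exclusive", "inclusive")
    }


if __name__ == "__main__":
    for method, cuts in quartiles_by_definition(wealth).items():
        print(method, cuts)
```

With a single extreme value in a sample of five, the upper quartile depends strongly on how the definition interpolates between the two largest observations, while the median is the same under both.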

    Small area estimation in R with application to Mexican income data

    In recent decades, policy decisions have often been based on statistical measures. The more detailed this information is, the better the basis for targeting policies and evaluating policy programs. For instance, the United Nations suggest more disaggregation of statistical indicators for monitoring their Sustainable Development Goals, and the number of National Statistical Institutes (NSIs) that recognize the need for more disaggregated statistics is increasing. Dimensions for disaggregation can be characteristics of individuals or households such as sex, age or ethnicity, economic activity, or spatial dimensions such as metropolitan areas or districts. Primary data sources for variables that are used to estimate statistical indicators are national household surveys. However, sample sizes are usually small or even zero at disaggregated levels. Therefore, direct estimators based only on survey data can be unreliable or not available for small domains. While the option of more specific surveys is costly, model-based methodologies for dealing with small sample sizes can help to obtain reliable estimates for small domains. The so-called Small Area Estimation (SAE) methods [1,2] link survey data that is only available for a proportion of households with administrative or census data available for all households in the area of interest. Even though academic researchers have proposed a wide range of SAE methods, so far these are applied only by a small number of NSIs or other practitioners like the World Bank. This gap between theoretical possibilities and practical application can have several reasons. One reason can be the lack of suitable statistical software. The free software environment R helps to counteract this issue since researchers can make their code available to the public via packages. Thus, new methods can reach the practitioner faster than with non-free software.
The next two sections summarize which packages are already available and what could be improved in the future.

    The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators

    The R package emdi enables the estimation of regionally disaggregated indicators using small area estimation methods and includes tools for processing, assessing, and presenting the results. The mean of the target variable, the quantiles of its distribution, the headcount ratio, the poverty gap, the Gini coefficient, the quintile share ratio, and customized indicators are estimated using direct and model-based estimation with the empirical best predictor (Molina and Rao 2010). The user is assisted by automatic estimation of data-driven transformation parameters. Parametric and semi-parametric wild bootstraps for mean squared error estimation are implemented, with the latter offering protection against possible misspecification of the error distribution. Tools for (a) customized parallel computing, (b) model diagnostic analyses, (c) creating high quality maps and (d) exporting the results to Excel and OpenDocument Spreadsheets are included. The functionality of the package is illustrated with example data sets for estimating the Gini coefficient and median income for districts in Austria.
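
To make two of the indicators listed in this abstract concrete, here is a minimal sketch of the Gini coefficient and the headcount ratio as plain, unweighted sample statistics. It is written in Python for illustration only and is not emdi's implementation; the function names and the choice to pass the poverty threshold explicitly are assumptions of this sketch.

```python
def gini(values):
    """Unweighted sample Gini coefficient for non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    rank_weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n


def headcount_ratio(values, threshold):
    """Share of units whose value lies strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


if __name__ == "__main__":
    incomes = [800, 1200, 1500, 2100, 5400]  # made-up incomes for one domain
    print(gini(incomes), headcount_ratio(incomes, threshold=1000))
```

Computed per domain on survey data alone, these are direct estimates; with few observations in a domain they become unstable, which is the motivation for the model-based estimation the abstract describes.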

    The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators

    The R package emdi offers a methodological and computational framework for the estimation of regionally disaggregated indicators using small area estimation methods and provides tools for assessing, processing and presenting the results. A range of indicators that includes the mean of the target variable, the quantiles of its distribution and complex, non-linear indicators or customized indicators can be estimated simultaneously using direct estimation and the empirical best predictor (EBP) approach (Molina and Rao 2010). In the application presented in this paper, the package emdi is used for estimating inequality indicators and the median of the income distributions for small areas in Austria. Because the EBP approach relies on the normality of the mixed model error terms, the user is further assisted by an automatic selection of data-driven transformation parameters. The uncertainty of small area estimates, measured by the mean squared error (MSE), is estimated using both a parametric bootstrap and a semi-parametric wild bootstrap. The additional uncertainty due to the estimation of the transformation parameter is also captured in MSE estimation. The semi-parametric wild bootstrap further protects the user against departures from the assumptions of the mixed model, in particular those of the unit-level error term. The bootstrap schemes are facilitated by computationally efficient code that uses parallel computing. The package supports the users beyond the production of small area estimates. Firstly, tools are provided for exploring the structure of the data and for diagnostic analysis of the model assumptions. Secondly, tools that allow the spatial mapping of the estimates enable the user to create high quality visualizations. Thirdly, results and model summaries can be exported to Excel™ spreadsheets for further reporting purposes.

    Switching Between Different Non-Hierachical Administrative Areas via Simulated Geo-Coordinates: A Case Study for Student Residents in Berlin

    The transformation of area aggregates between non-hierarchical area systems (administrative areas) is a standard problem in official statistics. For this problem, we present a proposal which is based on kernel density estimates. The approach applies a modification of a stochastic expectation maximization algorithm, which was proposed in the literature for the transformation of totals on rectangular areas to kernel density estimates. As a by-product of the routine, one obtains simulated geo-coordinates for each unit. With the help of these geo-coordinates, it is possible to calculate case numbers for any area system of interest. The proposed method is evaluated in a design-based simulation based on a close-to-reality, simulated data set with known exact geo-coordinates. In the empirical part, the method is applied to student resident figures from Berlin, Germany. These are known only at the level of ZIP codes, but they are needed for smaller administrative planning districts. Results for (a) student concentration areas and (b) temporal changes in the student residential areas between 2005 and 2015 are presented and discussed
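
The general idea of alternating between a kernel density estimate and simulated coordinates can be sketched in one dimension. This is a heavily simplified illustration, not the algorithm from the paper: the intervals stand in for source areas, the fixed bandwidth, grid resolution, iteration count, and all function names are assumptions of this sketch, and the counts are made up.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def simulate_coordinates(intervals, counts, bandwidth=0.5, iterations=5,
                         grid=20, seed=0):
    """Simulate one coordinate per unit so that source-interval counts are kept."""
    rng = random.Random(seed)
    # Start with units spread uniformly inside their source interval.
    points = [rng.uniform(lo, hi)
              for (lo, hi), n in zip(intervals, counts) for _ in range(n)]
    for _ in range(iterations):
        density = gaussian_kde(points, bandwidth)
        new_points = []
        for (lo, hi), n in zip(intervals, counts):
            # Candidate cell centres inside the interval, weighted by the
            # current density estimate.
            step = (hi - lo) / grid
            centres = [lo + (k + 0.5) * step for k in range(grid)]
            weights = [density(c) for c in centres]
            for c in rng.choices(centres, weights=weights, k=n):
                # Jitter within the chosen cell so points stay in the interval.
                new_points.append(c + rng.uniform(-0.5, 0.5) * step)
        points = new_points
    return points


def count_in(points, lo, hi):
    """Number of simulated units falling into the target interval [lo, hi)."""
    return sum(lo <= p < hi for p in points)


if __name__ == "__main__":
    source = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]   # source areas
    pts = simulate_coordinates(source, [30, 10, 5])  # made-up unit counts
    # Counts for a different, non-nested target area system.
    print(count_in(pts, 0.0, 3.0), count_in(pts, 3.0, 6.0))
```

Once each unit has a simulated coordinate, counts for any other area system follow by simple aggregation, which is the by-product the abstract refers to.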
